A series of papers has developed a statistical mechanics of neocortical interactions (SMNI), deriving 
aggregate behavior of experimentally observed columns of neurons from statistical electrical-chemical 
properties of synaptic interactions. While not useful to yield insights at the single neuron level, SMNI has 
demonstrated its capability in describing large-scale properties of short-term memory and 
electroencephalographic (EEG) systematics. The necessity of including nonlinear and stochastic 
structures in this development has been stressed. Sets of EEG and evoked potential data, collected to 
investigate genetic predispositions to alcoholism and to extract brain "signatures" of short-term 
memory, were fit. Adaptive Simulated Annealing (ASA), a global optimization algorithm, was used to 
perform maximum likelihood fits of Lagrangians defined by path integrals of multivariate conditional 
probabilities. Canonical momenta indicators (CMI) are thereby derived for each individual's EEG data. The 
CMI give better signal recognition than the raw data, and can be used to advantage as correlates of 
behavioral states. These results give strong quantitative support for an accurate intuitive picture, 
portraying neocortical interactions as having common algebraic or physics mechanisms that scale across 
quite disparate spatial scales and functional or behavioral phenomena, i.e., describing interactions among 
neurons, columns of neurons, and regional masses of neurons. 

1. INTRODUCTION 

A model of statistical mechanics of neocortical interactions (SMNI) has been developed [1-20], 
describing large-scale neocortical activity on scales of mm to cm as measured by scalp EEG, with an 
audit trail back to minicolumnar interactions among neurons. There are several aspects of this modeling 
that should be further investigated: to explore the robustness of the model, the range of experimental 
paradigms to which it is applicable, and further development that can increase its spatial resolution of 
EEG. 

The underlying mathematical physics used to develop SMNI gives rise to a natural coordinate system 
faithful to nonlinear multivariate sets of potential data such as measured by multi-electrode EEG, 
canonical momenta indicators (CMI) [20-22]. Recent papers in finance [21,22] and in EEG systems [20] 
have demonstrated that CMI give enhanced signal resolutions over raw data. 

The basic philosophy of SMNI is that good physical models of complex systems, often detailed by 
variables not directly measurable in many experimental paradigms, should offer superior descriptions of 
empirical data beyond that available from black-box statistical descriptions of such data. For example, 
good nonlinear models often offer sound approaches to relatively deeper understandings of these systems 
in terms of synergies of subsystems at finer spatial-temporal scales. 

In this context, a generic nonlinear stochastic mesoscopic neural network (MNN) has been developed for 
diffusion-type systems [14,19], using a confluence of techniques drawn from SMNI, modern methods of 
functional stochastic calculus defining nonlinear Lagrangians, adaptive simulated annealing (ASA) [23], 
and parallel-processing computation. MNN increases the resolution of SMNI to minicolumnar interactions within and 
between neocortical regions, a scale that overlaps with other studies of neural systems, e.g., artificial 
neural networks (ANN). 

In order to interface the algebra presented by SMNI with experimental data, several codes have been 
developed. A key tool is adaptive simulated annealing (ASA), a global optimization C-language 
code [23-27]. Over the years, this code has evolved to a high degree of robustness across many 
disciplines. However, there are over 100 OPTIONS available for tuning this code; this is as expected for 
any single global optimization code applicable to many classes of nonlinear systems, systems which 
typically are non-typical. 

Section 2 gives the background used to develop SMNI and ASA for the present study. Appendix A gives 
more detail on ASA relevant to this paper. Section 3 gives the mathematical development required for 
this study. Section 4 describes the procedures used. Section 5 presents conclusions. 

2. BACKGROUND 
2.1. EEG 

The SMNI approach develops mesoscopic scales of neuronal interactions at columnar levels of hundreds 
of neurons from the statistical mechanics of relatively microscopic interactions at neuronal and synaptic 
scales, poised to study relatively macroscopic dynamics at regional scales as measured by scalp 
electroencephalography (EEG). Relevant experimental confirmation is discussed in the SMNI papers at 
the mesoscopic scales as well as at macroscopic scales of scalp EEG. The derived firings of columnar 
activity, considered as order parameters of the mesoscopic system, develop multiple attractors, which 
illuminate attractors that may be present in the macroscopic regional dynamics of neocortex. SMNI 
proposes that models to be fitted to data include models of activity under each electrode, e.g., due to 
short-ranged neuronal fibers, as well as models of activity across electrodes, e.g., due to long-ranged 
fibers. These influences can be disentangled by SMNI fits. 

The SMNI approach is complementary to other methods of studying nonlinear neocortical dynamics at 
macroscopic scales. For example, EEG and magnetoencephalography (MEG) data have been expanded in 
a series of spatial principal components, a Karhunen-Loeve expansion. The coefficients in such 
expansions are identified as order parameters that characterize phase changes in cognitive studies [28,29] 
and epileptic seizures [30,31]. However, the SMNI CMI may be considered in a similar context, as 
providing a natural coordinate system that can be sensitive to experimental data, without assuming 
averages over stochastic parts of the system that may contain important information. 
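
As a minimal sketch of the Karhunen-Loeve expansion mentioned above, the following Python fragment extracts spatial principal components from a multichannel array by singular value decomposition. The function name, the synthetic 8-channel data, and the 10-Hz test signal are illustrative assumptions; this is not the analysis used in the cited studies.

```python
import numpy as np

def karhunen_loeve(data, n_modes):
    """Expand data (n_times, n_channels) in its leading spatial principal components."""
    centered = data - data.mean(axis=0)
    # SVD of the centered data: rows of vt are orthonormal spatial modes
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    modes = vt[:n_modes]                      # (n_modes, n_channels)
    coeffs = centered @ modes.T               # time-dependent expansion coefficients
    var_frac = s[:n_modes] ** 2 / np.sum(s ** 2)
    return modes, coeffs, var_frac

# Hypothetical data: one spatial pattern oscillating at 10 Hz plus weak noise
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 500)
pattern = rng.normal(size=8)
data = np.outer(np.sin(2 * np.pi * 10 * t), pattern) + 0.01 * rng.normal(size=(500, 8))

modes, coeffs, var_frac = karhunen_loeve(data, 2)
print(var_frac)   # fraction of variance carried by each retained mode
```

The coefficients `coeffs` play the role of the order parameters described in the text; note that this expansion averages over the fluctuations that the CMI are intended to retain.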

Theoretical studies of the neocortical medium have involved local circuits with postsynaptic potential 
delays [32-35], global studies in which finite velocity of action potential and periodic boundary conditions 
are important [36-39], and nonlinear nonequilibrium SMNI. The local and the global theories combine 
naturally to form a single theory in which control parameters effect changes between more local and more 
global dynamic behavior [39,40], in a manner somewhat analogous to localized and extended wave-function 
states in disordered solids. 

Plausible connections between the multiple-scale statistical theory and the more phenomenological global 
theory have been proposed [12]. Experimental studies of neocortical dynamics with EEG include maps of 
magnitude distribution over the scalp [37,41], standard Fourier analyses of EEG time series [37], and 
estimates of correlation dimension [42,43]. Other studies have emphasized that many EEG states are 
accurately described by a few coherent spatial modes exhibiting complex temporal 
behavior [28-31,37,39]. These modes are the order parameters at macroscopic scales that underpin the 
phase changes associated with changes of physiological state. 

For extracranial EEG, it is clear that spatial resolution, i.e., the ability to distinguish between two dipole 
sources as their distance decreases, is different from dipole localization, i.e., the ability to locate a single 
dipole [39]. The development of methods to improve the spatial resolution of EEG has made it more 
practical to study spatial structure. For example, high resolution methods provide apparent spatial 
resolution in the 2-3 cm range [44]. Dipole resolution may be as good as several mm [45]. Some 
algorithms calculate the (generally non-unique) inverse-problem of determining cortical sources that are 
weighted/filtered by volume conductivities of concentric spheres encompassing the brain, cerebrospinal 
fluid, skull, and scalp. A straightforward approach is to calculate the surface Laplacian from spline fits to 
the scalp potential distribution, yielding estimates similar to those obtained using concentric spheres 
models of the head [44]. Other measuring techniques, e.g., MEG, can provide complementary 
information. These methods have their strengths and weaknesses at various spatial-temporal frequencies. 

These source localization methods typically do not include in their models the synergistic contributions 
from short-ranged columnar firings of mm spatial extent and from long-ranged fibers spanning cm spatial 
extent. The CMI study presented here models these synergistic short-ranged and long-ranged 
interactions. This is elaborated on in the conclusion. 

2.2. Short-Term Memory (STM) 

The development of SMNI in the context of short-term memory (STM) tasks leads naturally to the 
identification of measured electric scalp potentials as arising from excitatory and inhibitory short-ranged 
and excitatory long-ranged fibers as they contribute to minicolumnar interactions [12,13]. Therefore, the 
SMNI CMI are most appropriately calculated in the context of STM experimental paradigms. It has been 
demonstrated that EEG data from such paradigms can be fit using only physical synaptic and 
neuronal parameters that lie within experimentally observed ranges [13,20]. 

The SMNI calculations are of minicolumnar interactions among hundreds of neurons, within a 
macrocolumnar extent of hundreds of thousands of neurons. Such interactions take place on time scales 
of several $\tau$, where $\tau$ is on the order of 10 msec (of the order of time constants of cortical pyramidal 
cells). This also is the observed time scale of the dynamics of STM. SMNI hypothesizes that columnar 
interactions within and/or between regions containing many millions of neurons are responsible for these 
phenomena at time scales of several seconds. That is, the nonlinear evolution at finer temporal scales 
gives a base of support for the phenomena observed at the coarser temporal scales, e.g., by establishing 
mesoscopic attractors at many macrocolumnar spatial locations to process patterns in larger regions. 

SMNI has presented a model of STM, to the extent it offers stochastic bounds for this phenomenon during 
focused selective attention [4,6,15,46-48], transpiring on the order of tenths of a second to seconds, 
limited to the retention of 7 ± 2 items [49]. These constraints exist even for apparently exceptional 
memory performers who, while they may be capable of more efficient encoding and retrieval of STM, and 
while they may be more efficient in "chunking" larger patterns of information into single items, 
nevertheless are limited to a STM capacity of 7 ± 2 items [50]. Mechanisms for various STM phenomena 
have been proposed across many spatial scales [51]. This "rule" is verified for acoustical STM, as well as 
for visual or semantic STM, which typically require longer times for rehearsal in an hypothesized 
articulatory loop of individual items, with a capacity that appears to be limited to 4 ± 2 [52]. SMNI has 
detailed these constraints in models of auditory and visual cortex [4,6,15,16]. 

Another interesting phenomenon of STM capacity explained by SMNI is the primacy versus recency 
effect in STM serial processing [6], wherein first-learned items are recalled most error-free, with 
last-learned items still more error-free than those in the middle [53]. The basic assumption being made is that 
a pattern of neuronal firing that persists for many $\tau$ cycles is a candidate to store the "memory" of 
activity that gave rise to this pattern. If several firing patterns can simultaneously exist, then there is the 
capability of storing several memories. The short-time probability distribution derived for the neocortex 
is the primary tool to seek such firing patterns. The deepest minima of this probability distribution are 
more likely to be accessed than the others, and these valleys are sharper than the others. I.e., they are more 
readily accessed and sustain their patterns against fluctuations more accurately than the others. The more 
recent memories or newer patterns may be presumed to be those having synaptic parameters more 
recently tuned and/or more actively rehearsed. 

It has been noted that experimental data on velocities of propagation of long-ranged fibers [37,39] and 
derived velocities of propagation of information across local minicolumnar interactions [2] yield 
comparable time scales of interactions across minicolumns of tenths of a second. Therefore, such 
phenomena as STM likely are inextricably dependent on interactions at local and global scales. 

2.2.1. SMNI & ADP 

A proposal has been advanced that STM is processed by information coded in approximately 40-Hz 
(approximately 2.5 foldings of $\tau$) bursts per stored memory, permitting up to seven such memories to be 
processed serially within single waves of lower frequencies on the order of 5 to 12 Hz [54]. To account 
for the observed duration of STM, the authors propose that observed after-depolarization (ADP) at synaptic sites, 
affected by the action of relatively long-time acting neuromodulators, e.g., acetylcholine and serotonin, 
acts to regularly "refresh" the stored memories in subsequent oscillatory cycles. A recent study of the 
action of neuromodulators in the neocortex supports the premise of their effects on broad spatial and 
temporal scales [55], but the ADP model is much more specific in its proposed spatial and temporal 
influences. 
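
The time scales quoted in this proposal can be checked with a short calculation. The only assumption is the value $\tau \approx 10$ msec given earlier for the SMNI time scale:

```latex
T_{40} = \frac{1}{40\ \mathrm{Hz}} = 25\ \mathrm{msec} = 2.5\,\tau \quad (\tau \approx 10\ \mathrm{msec}), \qquad
7\,T_{40} = 175\ \mathrm{msec}, \qquad
\frac{1}{175\ \mathrm{msec}} \approx 5.7\ \mathrm{Hz} .
```

Seven serial 40-Hz bursts therefore fit within one cycle of a wave near the lower end of the quoted 5 to 12 Hz range.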

SMNI does not detail any specific synaptic or neuronal mechanisms that might refresh these most-likely 
states to reinforce multiple short-term memories [18]. However, the calculated evolution of states is 
consistent with the observation that an oscillatory subcycle of 40 Hz may be the bare minimal threshold of 
self-sustaining minicolumnar firings before they begin to degrade [16]. 

The mechanism of ADP details a specific synaptic mechanism that, when coupled with additional 
proposals of neuronal oscillatory cycles of 5-12 Hz and oscillatory subcycles of 40 Hz, can sustain these 
memories for longer durations on the order of seconds. By itself, ADP does not provide a constraint such 
as the 7 ± 2 rule. The ADP approach does not address the observed random access phenomena of these 
memories, the 4±2 rule, the primacy versus recency rule, or the influence of STM in observed EEG 
patterns. 

SMNI and ADP models are complementary to the understanding of STM. MNN can be used to overlap 
the spatial scales studied by the SMNI with the finer spatial scales typically studied by other relatively 
more microscopic neural networks. At this scale, such models as ADP are candidates for providing an 
extended duration of firing patterns within the microscopic networks. 

2.2.2. PATHINT 

A path-integral C-language code, PATHINT, calculates the long-time probability distribution from the 
Lagrangian, e.g., as fit by the ASA code. A robust and accurate histogram-based (non-Monte Carlo) path- 
integral algorithm to calculate the long-time probability distribution had been developed to handle 
nonlinear Lagrangians [56-58], which was extended to two-dimensional problems [59]. PATHINT was 
developed for use in arbitrary dimensions, with additional code to handle general Neumann and Dirichlet 
conditions, as well as the possibility of including time-dependent potentials, drifts, and diffusions. The 
results of using PATHINT to determine the evolution of the attractors of STM give overall results 
consistent with previous calculations [15,16]. 
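
The idea of folding a short-time distribution forward on a mesh can be illustrated with a deliberately simplified one-dimensional sketch. It is not the PATHINT code: the dense kernel matrix, the renormalization step, the function name `propagate`, and the Ornstein-Uhlenbeck test system (drift $-M$, unit diffusion) are all assumptions chosen for brevity.

```python
import numpy as np

def propagate(p, mesh, drift, diffusion, dt, n_steps):
    """Fold a short-time Gaussian kernel n_steps times over a uniform 1-D mesh."""
    dm = mesh[1] - mesh[0]
    mean = mesh + drift(mesh) * dt          # prepoint mean for each starting point
    var = diffusion(mesh) * dt              # prepoint variance for each starting point
    # kernel[i, j] approximates P(mesh[i], t + dt | mesh[j], t)
    kernel = np.exp(-(mesh[:, None] - mean[None, :]) ** 2 / (2.0 * var[None, :]))
    kernel /= np.sqrt(2.0 * np.pi * var[None, :])
    for _ in range(n_steps):
        p = kernel @ p * dm                 # one folding of the path integral
        p /= p.sum() * dm                   # renormalize against boundary leakage
    return p

# Hypothetical test system: drift -M, constant diffusion 1
mesh = np.linspace(-5.0, 5.0, 201)
dm = mesh[1] - mesh[0]
p0 = np.zeros_like(mesh)
p0[np.argmin(np.abs(mesh - 2.0))] = 1.0 / dm      # sharply peaked at M = 2
p = propagate(p0, mesh, lambda m: -m, lambda m: np.ones_like(m), dt=0.01, n_steps=500)

mean_m = np.sum(mesh * p) * dm
var_m = np.sum((mesh - mean_m) ** 2 * p) * dm
print(mean_m, var_m)   # the long-time variance should approach diffusion / 2 = 0.5
```

For nonlinear drifts and diffusions the same folding applies, but the mesh spacing must resolve the short-time width $(g\,\Delta t)^{1/2}$ everywhere, which is the main practical constraint on such histogram methods.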

2.3. ASA 

In order to maintain some audit trail from large-scale regional activity back to mesoscopic columnar 
dynamics, desirable for both academic interest and practical signal enhancement, as few 
approximations as possible are made by SMNI in developing synaptic interactions up to the level of 
regional activity as measured by scalp EEG. This presents a formidable multivariate nonlinear 
nonequilibrium distribution as a model of EEG dynamics, a concept considered to be quite tentative by 
research panels as late as 1990, until it was demonstrated how fits to EEG data could be 
implemented [13]. 

In order to fit such distributions to real data, ASA, a global optimization technique and a superior 
variant of simulated annealing, has been developed [24]. This was tested using EEG data in 1991 [13], using an 
early and less flexible version of ASA, very fast simulated reannealing (VFSR) [24]. Here, this is tested on more 
refined EEG using more sensitive CMI to portray results of the fits [20]. 

ASA [23] fits short-time probability distributions to observed data, using a maximum likelihood technique 
on the Lagrangian. This algorithm has been developed to fit observed data to a theoretical cost function 
over a D-dimensional parameter space [24], adapting for varying sensitivities of parameters during the fit. 
Appendix A contains details of ASA relevant to its use in this paper. 
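
To make the flavor of the algorithm concrete, here is a stripped-down Python sketch of annealing over a bounded $D$-dimensional parameter space. It uses a heavy-tailed generating distribution and a temperature schedule $T(k) = T_0 \exp(-c\,k^{1/D})$. It is not the ASA C code and uses none of its OPTIONS: reannealing and per-parameter sensitivities are omitted, a single temperature serves for both generation and acceptance, out-of-range trials are clipped rather than regenerated, and the function name and test cost function are hypothetical.

```python
import numpy as np

def asa_like_minimize(cost, lower, upper, n_iter=5000, t0=1.0, c=1.0, seed=0):
    """Simplified annealing over a box; returns the best point and cost found."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    x = rng.uniform(lower, upper)
    fx = cost(x)
    best_x, best_f = x.copy(), fx
    for k in range(1, n_iter + 1):
        # exponential schedule in k**(1/D); the floor avoids division by zero
        temp = max(t0 * np.exp(-c * k ** (1.0 / dim)), 1e-200)
        # heavy-tailed step in [-1, 1] per parameter, scaled by the box width
        u = rng.uniform(size=dim)
        y = np.sign(u - 0.5) * temp * ((1.0 + 1.0 / temp) ** np.abs(2.0 * u - 1.0) - 1.0)
        trial = np.clip(x + y * (upper - lower), lower, upper)   # simplification: clip
        f_trial = cost(trial)
        # Boltzmann-type acceptance test (assumption: same temperature as generation)
        if f_trial <= fx or rng.uniform() < np.exp(-(f_trial - fx) / temp):
            x, fx = trial, f_trial
            if fx < best_f:
                best_x, best_f = x.copy(), fx
    return best_x, best_f

# Hypothetical cost function with its minimum at (1, -2)
def bowl(x):
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

best_x, best_f = asa_like_minimize(bowl, [-5.0, -5.0], [5.0, 5.0])
print(best_x, best_f)
```

In a maximum likelihood fit, `cost` would be the negative log-likelihood built from the Lagrangian, and the box bounds would be the experimentally observed parameter ranges.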

2.4. Complementary Research 

2.4.1. Chaos 

Given the context of studies in complex nonlinear systems [60], the question can be asked: What if EEG 
has chaotic mechanisms that overshadow the above stochastic considerations? The real issue is whether 
the scatter in data can be distinguished between being due to noise or chaos [61]. In this regard, several 
studies have been proposed with regard to comparing chaos to simple filtered (colored) noise [60,62]. 
Since the existence of multiplicative noise in neocortical interactions has been derived, the previous 
references must be generalized, and further investigation is required to decide whether EEG scatter can be 
distinguished from multiplicative noise. 

A recent study with realistic EEG wave equations strongly suggests that if chaos exists in a deterministic 
limit, it does not survive in macroscopic stochastic neocortex [63]. I.e., it is important to include 
stochastic aspects, as arise from the statistics of synaptic and columnar interactions, in any realistic 
description of macroscopic neocortex. 

2.4.2. Other Systems 

Experience using ASA on such multivariate nonlinear stochastic systems has been gained by similar 
applications of the approach used for SMNI. 

From 1986-1989, these methods of mathematical physics were utilized by a team of scientists and officers 
to develop mathematical comparisons of Janus computer combat simulations with exercise data from the 
National Training Center (NTC), developing a testable theory of combat successfully baselined to 
empirical data [59,64-68]. 

This methodology has been applied to financial markets [21,69-71], developing specific trading rules for 
S&P 500 to demonstrate the robustness of these mathematical and numerical algorithms. 

3. MATHEMATICAL DEVELOPMENT 

Fitting a multivariate nonlinear stochastic model to data is a necessary, but not sufficient procedure in 
developing new diagnostic software. Even an accurate model fit well to real data may not be immediately 
useful to clinicians and experimental researchers. To fill this void, the powerful intuitive basis of the 
mathematical physics used to develop SMNI has been utilized to describe the model in terms of rigorous 
CMI that provide an immediate intuitive portrait of the EEG data, faithfully describing the neocortical 
system being measured. The CMI give an enhanced signal over the raw data, and give some insights into 
the underlying columnar interactions. 

3.1. CMI, Information, Energy 

In the first SMNI papers, it was noted that this approach permitted the calculation of a true nonlinear 
nonequilibrium "information" entity at columnar scales. With reference to a steady state $\bar P(\tilde M)$ for a 
short-time Gaussian-Markovian conditional probability distribution $\tilde P$ of variables $\tilde M$, when it exists, an 
analytic definition of the information gain $\hat\Upsilon$ in state $\tilde P(\tilde M)$ over the entire neocortical volume is defined 
by [72,73] 

$$ \hat\Upsilon[\tilde P] = \int \cdots \int D\tilde M\, \tilde P \ln(\tilde P / \bar P) , \quad D\tilde M = (2\pi \hat g_0^2 \Delta t)^{-1/2} \prod_{s=1}^{u} (2\pi \hat g_s^2 \Delta t)^{-1/2}\, dM_s , \tag{1} $$

where a path integral is defined such that all intermediate-time values of M appearing in the folded short- 
time distributions P are integrated over. This is quite general for any system that can be described as 
Gaussian-Markovian [74], even if only in the short-time limit, e.g., the SMNI theory. 

As time evolves, the distribution likely no longer behaves in a Gaussian manner, and the apparent 
simplicity of the short-time distribution must be supplanted by numerical calculations. The Feynman 
Lagrangian is written in the midpoint discretization, for a specific macrocolumn corresponding to 

$$ M(\bar t_s) = \tfrac{1}{2}\,[M(t_{s+1}) + M(t_s)] . \tag{2} $$

This discretization defines a covariant Lagrangian $L_F$ that possesses a variational principle for arbitrary 
noise, and that explicitly portrays the underlying Riemannian geometry induced by the metric tensor $g_{GG'}$, 
calculated to be the inverse of the covariance matrix $g^{GG'}$. Using the Einstein summation convention, 

$$
\begin{aligned}
P &= \int \cdots \int DM \exp\Big(-\sum_{s=0}^{u} \Delta t\, L_{Fs}\Big) , \\
DM &= g_{0+}^{1/2} (2\pi\Delta t)^{-1/2} \prod_{s=1}^{u} g_{s+}^{1/2} \prod_{G=1}^{\Theta} (2\pi\Delta t)^{-1/2}\, dM^G_s , \\
\int dM^G_s &\to \sum_{\iota=1}^{N^G} \Delta M^G_{\iota s} , \quad M^G_0 = M^G_{t_0} , \quad M^G_{u+1} = M^G_t , \\
L_F &= \tfrac{1}{2} (dM^G/dt - h^G)\, g_{GG'}\, (dM^{G'}/dt - h^{G'}) + \tfrac{1}{2} h^G{}_{;G} + R/6 - V , \\
(\cdots)_{,G} &= \frac{\partial(\cdots)}{\partial M^G} , \\
h^G &= g^G - \tfrac{1}{2} g^{-1/2} (g^{1/2} g^{GG'})_{,G'} , \\
g_{GG'} &= (g^{GG'})^{-1} , \\
g_s[M^G(\bar t_s), \bar t_s] &= \det(g_{GG'})_s , \quad g_{s+} = g_s[M^G_{s+1}, \bar t_s] , \\
h^G{}_{;G} &= h^G{}_{,G} + \Gamma^F_{GF} h^G = g^{-1/2} (g^{1/2} h^G)_{,G} , \\
\Gamma^F_{JK} &\equiv g^{LF} [JK, L] = g^{LF} (g_{JL,K} + g_{KL,J} - g_{JK,L}) , \\
R &= g^{JL} R_{JL} = g^{JL} g^{FK} R_{FJKL} , \\
R_{FJKL} &= \tfrac{1}{2} (g_{FK,JL} - g_{JK,FL} - g_{FL,JK} + g_{JL,FK}) + g_{MN} (\Gamma^M_{FK} \Gamma^N_{JL} - \Gamma^M_{FL} \Gamma^N_{JK}) ,
\end{aligned}
\tag{3}
$$

where $R$ is the Riemannian curvature, and the discretization is explicitly denoted in the mesh of $M^G_{\iota s}$ by $\iota$. 
If $M$ is a field, e.g., also dependent on a spatial variable $x$ discretized by $\nu$, then the variables $M^G_s$ are 
increased to $M^{G\nu}_s$, e.g., as prescribed for the macroscopic neocortex. The term $R/6$ in $L_F$ includes a 
contribution of $R/12$ from the WKB approximation to the same order of $(\Delta t)^{3/2}$ [75]. 

A prepoint discretization for the same probability distribution P gives a much simpler algebraic form, 

$$ M(\bar t_s) = M(t_s) , $$

$$ L = \tfrac{1}{2}\,(dM^G/dt - g^G)\, g_{GG'}\, (dM^{G'}/dt - g^{G'}) - V , \tag{4} $$

but the Lagrangian L so specified does not satisfy a variational principle useful for moderate to large 
noise; its associated variational principle only provides information useful in the weak-noise limit [76]. 
The neocortex presents a system of moderate noise. Still, this prepoint-discretized form has been quite 
useful in all systems examined thus far, simply requiring a somewhat finer numerical mesh. Note that 
although integrations are indicated over a huge number of independent variables, i.e., as denoted by 
$dM^{G\nu}$, the physical interpretation afforded by statistical mechanics makes these systems mathematically 
and physically manageable. 

It must be emphasized that the output need not be confined to complex algebraic forms or tables of 
numbers. Because $L_F$ possesses a variational principle, sets of contour graphs, at different long-time 
epochs of the path-integral of P, integrated over all its variables at all intermediate times, give a visually 
intuitive and accurate decision aid to view the dynamic evolution of the scenario. For example, as given 
in Table 1, this Lagrangian approach permits a quantitative assessment of concepts usually only loosely 
defined. These physical entities provide another form of intuitive, but quantitatively precise, presentation 
of these analyses [68,77]. In this study, the above canonical momenta are referred to as canonical momenta 
indicators (CMI). 
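
As an illustration of how such momenta can be evaluated from data, the sketch below computes $\Pi_G = \partial L / \partial(dM^G/dt) = g_{GG'}(dM^{G'}/dt - g^{G'})$ for the prepoint Lagrangian of Eq. (4) along a sampled multivariate time series. The function name, the finite-difference estimate of the rates, and the linear drift and constant diffusion in the example are assumptions made only for this demonstration.

```python
import numpy as np

def canonical_momenta(m, dt, drift, diffusion):
    """Prepoint canonical momenta along a series m of shape (n_times, n_vars).

    drift(m_t) returns the vector g^G; diffusion(m_t) returns the covariance
    matrix g^{GG'}; the metric g_{GG'} is its inverse.
    """
    m = np.asarray(m, dtype=float)
    rates = np.diff(m, axis=0) / dt                 # finite-difference dM/dt
    momenta = np.empty_like(rates)
    for s in range(rates.shape[0]):
        g_drift = drift(m[s])                       # evaluated at the prepoint m(t_s)
        metric = np.linalg.inv(diffusion(m[s]))     # g_{GG'} = (g^{GG'})^{-1}
        momenta[s] = metric @ (rates[s] - g_drift)
    return momenta

# Hypothetical two-variable example: drift -M, diagonal constant diffusion
series = [[1.0, 0.0], [0.9, 0.1]]
pi = canonical_momenta(series, 0.1,
                       drift=lambda m_t: -m_t,
                       diffusion=lambda m_t: np.diag([2.0, 0.5]))
print(pi)
```

The momenta vanish wherever the observed rate of change equals the fitted drift, so they highlight departures from the most likely evolution, weighted by the inverse of the local noise.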

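Concept       Lagrangian equivalent

Momentum      $\Pi^G = \partial L_F / \partial(\partial M^G/\partial t)$
Mass          $g_{GG'} = \partial^2 L_F / \partial(\partial M^G/\partial t)\,\partial(\partial M^{G'}/\partial t)$
Force         $\partial L_F / \partial M^G$
F = ma        $\delta L_F = 0 = \partial L_F/\partial M^G - \frac{\partial}{\partial t}\,\partial L_F/\partial(\partial M^G/\partial t)$

TABLE 1. Descriptive concepts and their mathematical equivalents in a Lagrangian representation.

In a prepoint discretization, where the Riemannian geometry is not explicit (but calculated in the first 
SMNI papers), the distributions of neuronal activities $p_{\sigma_j}$ are developed into distributions for activity under 
an electrode site $P$ in terms of a Lagrangian $L$ and threshold functions $F^G$, 

$$
\begin{aligned}
P &= \prod_G P^G[M^G(r; t + \tau) | M^{\bar G}(r'; t)]
   = \sum_{\sigma_j} \delta\Big(\sum_{jE} \sigma_j - M^E(r; t + \tau)\Big)\, \delta\Big(\sum_{jI} \sigma_j - M^I(r; t + \tau)\Big) \prod_j^N p_{\sigma_j} \\
  &\approx \prod_G (2\pi\tau g^{GG})^{-1/2} \exp(-N\tau L^G) = (2\pi\tau)^{-1/2} g^{1/2} \exp(-N\tau L) , \\
L &= T - V , \quad T = (2N)^{-1} (\dot M^G - g^G)\, g_{GG'}\, (\dot M^{G'} - g^{G'}) , \\
g^G &= -\tau^{-1} (M^G + N^G \tanh F^G) , \quad g^{GG'} = (g_{GG'})^{-1} = \delta^{G'}_G\, \tau^{-1} N^G\, \mathrm{sech}^2 F^G , \quad g = \det(g_{GG'}) , \\
F^G &= \frac{V^G - v^{|G|}_{G'} T^{|G|}_{G'}}{\big(\pi [(v^{|G|}_{G'})^2 + (\phi^{|G|}_{G'})^2]\, T^{|G|}_{G'}\big)^{1/2}} , \\
T^{|G|}_{G'} &= a^{|G|}_{G'} N^{G'} + \tfrac{1}{2} A^{|G|}_{G'} M^{G'} + a^{\dagger |G|}_{G'} N^{\dagger G'} + \tfrac{1}{2} A^{\dagger |G|}_{G'} M^{\dagger G'} + a^{\ddagger |G|}_{G'} N^{\ddagger G'} + \tfrac{1}{2} A^{\ddagger |G|}_{G'} M^{\ddagger G'} , \\
a^{\dagger G}_{G'} &= \tfrac{1}{2} A^{\dagger G}_{G'} + B^{\dagger G}_{G'} , \quad
A^{\ddagger I}_E = A^{\ddagger E}_I = A^{\ddagger I}_I = B^{\ddagger I}_E = B^{\ddagger E}_I = B^{\ddagger I}_I = 0 , \quad
a^{\ddagger E}_E = \tfrac{1}{2} A^{\ddagger E}_E + B^{\ddagger E}_E ,
\end{aligned}
\tag{5}
$$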

where no sum is taken over repeated $|G|$, $A^G_{G'}$ and $B^G_{G'}$ are macrocolumnar-averaged interneuronal synaptic 
efficacies, $v^G_{G'}$ and $\phi^G_{G'}$ are averaged means and variances of contributions to neuronal electric 
polarizations, $N^G$ are the numbers of excitatory and inhibitory neurons per minicolumn, and the variables 
associated with $M^G$, $M^{\dagger G}$ and $M^{\ddagger G}$ relate to multiple scales of activities from minicolumns, between 
minicolumns within regions, and across regions, respectively. The nearest-neighbor interactions $V$ can be 
modeled in greater detail by a stochastic mesoscopic neural network [14]. The SMNI papers give more 
detail on this derivation. 

In terms of the above variables, an energy or Hamiltonian density H can be defined, 

H = T + V , (6) 

in terms of the $M^G$ and $\Pi^G$ variables, and the path integral is now defined over all the $DM^G$ as well as 
over the $D\Pi^G$ variables. 

3.2. Nonlinear String Model 

A mechanical-analog model, the string model, is derived explicitly for neocortical interactions using 
SMNI [12]. In addition to providing overlap with current EEG paradigms, this defines a probability 
distribution of firing activity, which can be used to further investigate the existence of other nonlinear 
phenomena, e.g., bifurcations or chaotic behavior, in brain states. 

Previous SMNI studies have detailed that maximal numbers of attractors lie within the physical firing 
space of $M^G$, consistent with experimentally observed capacities of auditory and visual STM, when a 
"centering" mechanism is enforced by shifting background conductivities of synaptic interactions, 
consistent with experimental observations under conditions of selective attention [4,6,15,16,78]. This 
leads to an effect of having all attractors of the short-time distribution lie along a diagonal line in $M^G$ 
space, effectively defining a narrow parabolic trough containing these most likely firing states. This 
essentially collapses the two-dimensional $M^G$ space down to a one-dimensional space of most importance. 

Thus, the predominant physics of short-term memory and of (short-fiber contribution to) EEG phenomena 
takes place in a narrow "parabolic trough" in $M^G$ space, roughly along a diagonal line [4]. Here, $G$ 
represents $E$ or $I$, $M^E$ represents contributions to columnar firing from excitatory neurons, and $M^I$ 
represents contributions to columnar firing from inhibitory neurons. The object of interest within a short 
refractory time, $\tau$, approximately 5 to 10 msec, is the Lagrangian $L$ for a mesocolumn, detailed above. 
$\tau L$ can vary by as much as a factor of $10^5$ from the highest peak to the lowest valley in $M^G$ space. 
Therefore, it is reasonable to assume that a single independent firing variable might offer a crude 
description of this physics. Furthermore, the scalp potential $\Phi$ can be considered to be a function of this 
firing variable. (Here, "potential" refers to the electric potential, not the potential term in the Lagrangian 
above.) In an abbreviated notation subscripting the time-dependence, 

$$ \Phi_t - \ll\Phi\gg = \Phi(M^E_t, M^I_t) \approx a\,(M^E_t - \ll M^E \gg) + b\,(M^I_t - \ll M^I \gg) , \tag{7} $$

where $a$ and $b$ are constants, and $\ll\Phi\gg$ and $\ll M^G \gg$ represent typical minima in the trough. In the 
context of fitting data to the dynamic variables, there are three effective constants, $\{a, b, \phi\}$, 

$$ \Phi_t - \phi = a M^E_t + b M^I_t . \tag{8} $$

The mesoscopic probability distributions, P, are scaled and aggregated over this columnar firing space to 
obtain the macroscopic probability distribution over the scalp-potential space: 

$$ P_\Phi[\Phi] = \int dM^E\, dM^I\, P[M^E, M^I]\, \delta[\Phi - \Phi'(M^E, M^I)] . \tag{9} $$

In the prepoint discretization, the postpoint $M^G(t + \Delta t)$ moments are given by 

$$ m \equiv \langle \Phi_\nu - \phi \rangle = a \langle M^E \rangle + b \langle M^I \rangle = a g^E + b g^I , $$

$$ \sigma^2 \equiv \langle (\Phi_\nu - \phi)^2 \rangle - \langle \Phi_\nu - \phi \rangle^2 = a^2 g^{EE} + b^2 g^{II} , \tag{10} $$

where the $M^G$-space drifts $g^G$, and diffusions $g^{GG'}$, are given above. Note that the macroscopic drifts and 
diffusions of the $\Phi$'s are simply linearly related to the mesoscopic drifts and diffusions of the $M^G$'s. For 
the prepoint $M^G(t)$ firings, the same linear relationship in terms of $\{\phi, a, b\}$ is assumed. 
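
The linear map from mesoscopic to macroscopic moments can be sketched numerically. The drift and diffusion expressions follow Eq. (5) and the combination follows Eq. (10); the function names and all numerical values (including the neuron numbers, $\tau$, and the constants $a$, $b$) are illustrative assumptions, and the threshold factors $F^G$ are simply supplied as inputs rather than computed from synaptic parameters.

```python
import numpy as np

def mesoscopic_moments(m_g, n_g, f_g, tau):
    """Drifts g^G and diagonal diffusions g^{GG} for G = (E, I)."""
    m_g, n_g, f_g = (np.asarray(v, dtype=float) for v in (m_g, n_g, f_g))
    drift = -(m_g + n_g * np.tanh(f_g)) / tau
    diffusion = n_g / (np.cosh(f_g) ** 2 * tau)      # sech^2 = 1 / cosh^2
    return drift, diffusion

def scalp_moments(a, b, drift, diffusion):
    """Macroscopic drift m and variance sigma^2 of the scalp potential."""
    mean = a * drift[0] + b * drift[1]
    var = a ** 2 * diffusion[0] + b ** 2 * diffusion[1]
    return mean, var

# Hypothetical inputs: firings (M^E, M^I), neuron numbers (N^E, N^I), F^G = 0
drift, diffusion = mesoscopic_moments([10.0, -5.0], [80.0, 30.0], [0.0, 0.0], tau=0.01)
m, sigma2 = scalp_moments(1.0, 2.0, drift, diffusion)
print(m, sigma2)
```

Because the map is linear, any nonlinearity in the fitted scalp-potential model enters only through the dependence of $F^G$ on the firings.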

For the prepoint $M^E(t)$ firings, advantage is taken of the parabolic trough derived for the STM 
Lagrangian, and 

$$ M^I(t) = c M^E(t) , \tag{11} $$

where the slope $c$ is set to the close approximate value determined by a detailed calculation of the 
centering mechanism [15], 

$$ A^E_E M^E - A^E_I M^I \approx 0 . \tag{12} $$

This permits a complete transformation from $M^G$ variables to $\Phi$ variables. 
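
Written out, the transformation follows from substituting Eq. (11) into Eq. (8). The only added assumption is that $a + bc \neq 0$:

```latex
\Phi_t - \phi = a M^E_t + b M^I_t = (a + b c)\, M^E_t
\quad \Longrightarrow \quad
M^E_t = \frac{\Phi_t - \phi}{a + b c} , \qquad
M^I_t = \frac{c\,(\Phi_t - \phi)}{a + b c} .
```

Each measured prepoint potential therefore fixes both prepoint firings along the trough.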

Similarly, as appearing in the modified threshold factor $F^G$, each regional influence from electrode site $\mu$ 
acting at electrode site $\nu$, given by afferent firings $M^{\ddagger E}$, is taken as 

$$ M^{\ddagger E}_{\mu \to \nu} = d_\nu M^E_\mu(t - T_{\mu \to \nu}) , \tag{13} $$

where $d_\nu$ are constants to be fitted at each electrode site, and $T_{\mu \to \nu}$ are the delay times estimated above 
for inter-electrode signal propagation, based on anatomical knowledge of the neocortex and of velocities 
of propagation of action potentials of long-ranged fibers, typically on the order of one to several multiples 
of $\tau$ = 5 msec. Some terms in which $d$ directly affects the shifts of synaptic parameters when 
calculating the centering mechanism also contain long-ranged efficacies (inverse conductivities) $B^{\ddagger E}_E$. 
Therefore, the latter were kept fixed with the other electrical-chemical synaptic parameters during these 
fits. Future fits will experiment taking the $T$'s as parameters. 

This defines the conditional probability distribution for the measured scalp potential $\Phi_\nu$, 

$$ P_\nu[\Phi_\nu(t + \Delta t) | \Phi_\nu(t)] = \frac{1}{(2\pi\sigma^2\Delta t)^{1/2}} \exp(-L_\nu \Delta t) , $$

$$ L_\nu = \frac{1}{2\sigma^2}\,(\dot\Phi_\nu - m)^2 . \tag{14} $$

The probability distribution for all electrodes is taken to be the product of all these distributions: 

$$ P = \prod_\nu P_\nu , \quad L = \sum_\nu L_\nu . \tag{15} $$
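
A maximum likelihood fit minimizes the negative logarithm of this product over all electrodes and time steps. The sketch below evaluates that cost for a sampled potential array. The function name, the finite-difference rate, and the trivial constant drift and variance used in the example are assumptions; in an actual fit `mean_fn` and `var_fn` would encode the parameter-dependent $m$ and $\sigma^2$ of Eq. (10).

```python
import numpy as np

def neg_log_likelihood(phi, dt, mean_fn, var_fn):
    """Sum over time steps and electrodes of L*dt + 0.5*ln(2*pi*sigma^2*dt).

    phi has shape (n_times, n_electrodes); mean_fn and var_fn take the prepoint
    potentials and return m and sigma^2 arrays of the same shape.
    """
    phi = np.asarray(phi, dtype=float)
    rate = np.diff(phi, axis=0) / dt              # finite-difference d(Phi)/dt
    prepoint = phi[:-1]
    mean = mean_fn(prepoint)
    var = var_fn(prepoint)
    lagrangian = (rate - mean) ** 2 / (2.0 * var)
    return np.sum(lagrangian * dt + 0.5 * np.log(2.0 * np.pi * var * dt))

# Hypothetical one-electrode, one-step example with m = 0 and sigma^2 = 1
cost = neg_log_likelihood([[0.0], [0.1]], 0.1,
                          mean_fn=lambda p: np.zeros_like(p),
                          var_fn=lambda p: np.ones_like(p))
print(cost)
```

The logarithmic term matters whenever $\sigma^2$ depends on the fitted parameters: dropping it would let the optimizer inflate the variance to shrink the Lagrangian.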



Note that the belief in the dipole or nonlinear-string model is being invoked. The model SMNI, derived 
for $P[M^G(t + \Delta t) | M^G(t)]$, is for a macrocolumnar-averaged minicolumn; hence it is expected to be a 
reasonable approximation to represent a macrocolumn, scaled to its contribution to $\Phi_\nu$. Hence, the $L$ of Eq. (15) is used 
to represent this macroscopic regional Lagrangian, scaled from its mesoscopic mesocolumnar counterpart, 
the $L$ of Eq. (5). However, the above expression for $P_\nu$ uses the dipole assumption to also use this expression to 
represent several to many macrocolumns present in a region under an electrode: A macrocolumn has a 
spatial extent of about a mm. It is often argued that typically several macrocolumns firing coherently 
account for the electric potentials measured by one scalp electrode [79]. Then, this model is being tested 
to see if the potential will scale to a representative macrocolumn. The results presented here seem to 
confirm that this approximation is in fact quite reasonable.

The parabolic trough described above justifies a form

$$P_{\Phi} = (2 \pi \sigma^2 \Delta t)^{-1/2} \exp\left(- \frac{\Delta t}{2 \sigma^2} \int dx \, L_{\Phi}\right) ,$$

$$L_{\Phi} = \frac{\alpha}{2} |\partial \Phi / \partial t|^2 + \frac{\beta}{2} |\partial \Phi / \partial x|^2 + \frac{\gamma}{2} |\Phi|^2 + F(\Phi) , \qquad (16)$$

where $F(\Phi)$ contains nonlinearities away from the trough, $\sigma^2$ is on the order of $N$ given the derivation of
$L$ above, and the integral over $x$ is taken over the spatial region of interest. In general, there also will be
terms linear in $\partial \Phi / \partial t$ and in $\partial \Phi / \partial x$. (This corrects a typo that appears in several papers [12,13,17,19],
incorrectly giving the order of $\sigma^2$ as $1/N$. The order $N$ was first derived [13] from $\sigma^2$ being expressed as
a sum over the $E$ and $I$ diffusions given above.)

Previous calculations of EEG phenomena [5] show that the short-fiber contribution to the $\alpha$ frequency
and the movement of attention across the visual field are consistent with the assumption that the EEG
physics is derived from an average over the fluctuations of the system, e.g., represented by $\sigma$ in the above
equation. I.e., this is described by the Euler-Lagrange equations derived from the variational principle
possessed by $L_{\Phi}$ (essentially the counterpart to force equals mass times acceleration), more properly by
the "midpoint-discretized" Feynman $L_{\Phi}$, with its Riemannian terms [2,3,11].

3.3. CMI Sensitivity 

In the SMNI approach, "information" is a concept well defined in terms of the probability eigenfunctions 
of electrical-chemical activity of this Lagrangian. The path-integral formulation presents an accurate 
intuitive picture of an initial probability distribution of patterns of firings being filtered by the 
(exponential of the) Lagrangian, resulting in a final probability distribution of patterns of firing. 

The utility of a measure of information has been noted by other investigators. For example, there have 
been attempts to use information as an index of EEG activity [80,81]. These attempts have focused on the 
concept of "mutual information" to find correlations of EEG activity under different electrodes. Other 
investigators have looked at simulation models of neurons to extract information as a measure of 
complexity of information processing [82]. Some other investigators have examined the utility of the 
energy density as a viable measure of information processing in STM paradigms [83].

The SMNI approach at the outset recognizes that, for most brain states of late latency, at least a subset of
regions being measured by several electrodes is indeed to be considered as one system, and their 
interactions are to be explicated by mathematical or physical modeling of the underlying neuronal 
processes. Then, it is not relevant to compare joint distributions over a set of electrodes with marginal 
distributions over individual electrodes. 

In the context of the present SMNI study, the CMI transform covariantly under Riemannian 
transformations, but are more sensitive measures of neocortical activity than other invariants such as the 
energy density, effectively the square of the CMI, or the information which also effectively is in terms of 
the square of the CMI (essentially path integrals over quantities proportional to the energy times a factor 
of an exponential including the energy as an argument). Neither the energy nor the information gives details
of the components as do the CMI. EEG is measuring a quite oscillatory system and the relative signs of 
such activity are quite important. The information and energy densities are calculated and printed out 
after ASA fits along with the CMI. 

4. SMNI APPLICATIONS TO INDIVIDUAL EEG 
4.1. Data 

EEG spontaneous and evoked potential (EP) data from a multi-electrode array under a variety of 
conditions was collected at several centers in the United States, sponsored by the National Institute on 
Alcohol Abuse and Alcoholism (NIAAA) project. The earlier 1991 study used only averaged EP 
data [84]. These experiments, performed on carefully selected sets of subjects, suggest a genetic 
predisposition to alcoholism that is strongly correlated to EEG AEP responses to patterned targets. 

It is clear that the author is not an expert in the clinical aspects of these alcoholism studies. It suffices for 
this study that the data used is clean raw EEG data, and that these SMNI, CMI, and ASA techniques can 
and should be used and tested on other sources of EEG data as well. 

Each set of results is presented with 6 figures, labeled as [{alcoholic | control}, {stimulus 1 | match | no-match},
subject, {potential | momenta}], abbreviated to {a | c}_{1 | m | n}_subject.{pot | mom}, where
match or no-match was performed for stimulus 2 after 3.2 sec of a presentation of stimulus 1 [84]. Data
includes 10 trials of 69 epochs each between 150 and 400 msec after presentation. For each subject run,
after fitting 28 parameters with ASA, epoch by epoch averages are developed of the raw data and of the
multivariate SMNI CMI. It was noted that much poorer fits were achieved when the "centering"
mechanism [4,6], driving multiple attractors into the physical firing regions bounded by $M^G \le \pm N^G$, was
turned off and the denominators in $F^G$ were set to constants, confirming the importance of using the full
SMNI model. All stimuli were presented for 300 msec. For example, c_m_co2c0000337.pot is a figure.
Note that the subject number also includes the {alcoholic | control} tag, but this tag was added just to aid
sorting of files (as there are contributions from co2 and co3 subjects). Each figure contains graphs
superimposed for 6 electrode sites (out of 64 in the data) which have been modeled by SMNI using a
circuitry given in Table 2 of frontal sites (F3 and F4) feeding temporal (sides of head T7 and T8) and
parietal (top of head P7 and P8) sites, where odd-numbered (even-numbered) sites refer to the left (right)
brain.
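
The figure-naming convention above can be unpacked mechanically. The following is a small illustrative sketch only; the function name and dictionary names are hypothetical and are not part of the distributed figure archive.

```python
# Lookup tables for the three coded fields of a figure name.
CONDITION = {"a": "alcoholic", "c": "control"}
PARADIGM = {"1": "stimulus 1", "m": "match", "n": "no-match"}
KIND = {"pot": "potential", "mom": "momenta"}

def parse_figure_name(name):
    """Split '{a|c}_{1|m|n}_subject.{pot|mom}' into its four labeled parts."""
    stem, ext = name.rsplit(".", 1)
    cond, paradigm, subject = stem.split("_", 2)
    return CONDITION[cond], PARADIGM[paradigm], subject, KIND[ext]
```

For instance, applying `parse_figure_name` to `c_m_co2c0000337.pot` identifies a control subject, the match paradigm, subject `co2c0000337`, and a plot of potentials.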

4.2. ASA Tuning 

A three-stage optimization was performed for each of 60 data sets in {a_1, a_m, a_n, c_1, c_m, c_n} of
10 subjects. As described previously, each of these data sets had 3-5 parameters for each SMNI
electrode-site model in {F3, F4, T7, T8, P7, P8}, i.e., 28 parameters for each of the optimization runs, to
be fit to over 400 pieces of potential data.

For each state generated in the fit, prior to calculating the Lagrangian, tests were performed to ensure that 
all short-ranged and long-ranged firings lay in their physical boundaries. When this test failed, the 
generated state was simply excluded from the parameter space for further consideration. This is a
standard simulated-annealing technique to handle complex constraints.
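
A minimal sketch of this rejection step follows. It is not the ASA implementation: the uniform proposal, the bound test on firings, and all names are assumptions chosen only to show the idea of discarding a generated state before the cost function is ever evaluated.

```python
import random

def within_firing_bounds(firings, n_max):
    """Stand-in physical test: every firing M must satisfy -N <= M <= N."""
    return all(-n_max <= m <= n_max for m in firings)

def generate_feasible(current, step, is_feasible, rng, max_tries=1000):
    """Propose states near `current`, discarding any that violate the constraints.

    Only a state that passes `is_feasible` is returned, so the (expensive)
    Lagrangian would be computed for feasible states alone.
    """
    for _ in range(max_tries):
        candidate = [x + rng.uniform(-step, step) for x in current]
        if is_feasible(candidate):
            return candidate
    raise RuntimeError("no feasible state found within max_tries")
```

Rejecting infeasible states outright keeps the cost function free of penalty terms during the global search, at the price of some wasted proposals near the boundary.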

4.2.1. First-Stage Optimization 

The first-stage optimization used ASA, version 13.1, tuned to give reasonable performance by examining 
intermediate results of several sample runs in detail. Table 3 gives those OPTIONS changed from their 
defaults. (See Appendix A for a discussion of ASA OPTIONS.) 

The ranges of the parameters were decided as follows. The ranges of the strength of the long-range
connectivities $d_{\nu}$ were from 0 to 1. The ranges of the {a, b, c} parameters were decided by using
minimum and maximum values of $M^G$ and $M^{\ddagger G}$ firings to keep the potential variable within the
minimum and maximum values of the experimentally measured potential at each electrode site.

Site    Contributions From    Time Delays (3.906 msec)
F3      -                     -
F4      -                     -
T7      F3                    1
T7      T8                    1
T8      F4                    1
T8      T7                    1
P7      T7                    1
P7      P8                    1
P7      F3                    2
P8      T8                    1
P8      P7                    1
P8      F4                    2

TABLE 2. Circuitry of long-ranged fibers across most relevant electrode sites and their
assumed time-delays in units of 3.906 msec.

Using the above ASA OPTIONS and ranges of parameters, it was found that typically within several 
thousand generated states, the global minimum was approached within at least one or two significant 
figures of the effective Lagrangian (including the prefactor). This estimate was based on final fits 
achieved with hundreds of thousands of generated states. Runs were permitted to continue for 50,000 
generated states. This very rapid convergence in these 30-dimensional spaces was partially due to the 
invocation of the centering mechanism. 

Some tests with SMNI parameters off the diagonal in $M^G$-space, as established by the centering
mechanism, confirmed that ASA converged back to this diagonal, but requiring many more generated
states. Of course, an examination of the Lagrangian shows this trivially, as noted in previous papers [3,4],
wherein the Lagrangian values were on the order of $10^{5} \tau^{-1}$, compared to $10^{-2}$-$10^{-3}\ \tau^{-1}$ along the
diagonal established by the centering mechanism.

OPTIONS                         Default    Stage 1 Use
Limit_Acceptances               10000      25000
Limit_Generated                 99999      50000
Cost_Precision                  1.0E-18    1.0E-9
Number_Cost_Samples             5          3
Cost_Parameter_Scale_Ratio      1.0        0.2
Acceptance_Frequency_Modulus    100        25
Generated_Frequency_Modulus     10000      10
Reanneal_Cost                   1          4
Reanneal_Parameters             1          0
SMALL_FLOAT                     1.0E-18    1.0E-30
ASA_LIB                         FALSE      TRUE
QUENCH_COST                     FALSE      TRUE
QUENCH_PARAMETERS               FALSE      TRUE
COST_FILE                       TRUE       FALSE
NO_PARAM_TEMP_TEST              FALSE      TRUE
NO_COST_TEMP_TEST               FALSE      TRUE
TIME_CALC                       FALSE      TRUE
ASA_PRINT_MORE                  FALSE      TRUE

TABLE 3. ASA OPTIONS changes from their defaults used in stage one optimization.

4.2.2. Second-Stage Optimization 

The second-stage optimization was invoked to minimize the number of generated states that would have
been required if only the first-stage optimization were performed. Table 4 gives the changes made in the
OPTIONS from stage one for stage two.

OPTIONS                       Stage 2 Changes
Limit_Acceptances             5000
Limit_Generated               10000
User_Initial_Parameters       TRUE
User_Quench_Param_Scale[.]    30

TABLE 4. ASA OPTIONS changes from their use in stage one for stage two optimization.



The final stage-one parameters were used as the initial starting parameters for stage two. (At high 
annealing/quenching temperatures at the start of an SA run, it typically is not important as to what the 
initial values of the parameters are, provided of course that they satisfy all constraints, etc.) The
second-stage minimum of each parameter was chosen to be the maximum lower bound of the first-stage 
minimum and a 20% increase of that minimum. The second-stage maximum of each parameter was 
chosen to be the minimum upper bound of the first-stage maximum and a 20% decrease of that maximum. 

Extreme quenching was turned on for the parameters (not for the cost temperature), at values of the 
parameter dimension of 30, increased from 1 (for rigorous annealing). This worked very well, typically 
achieving the global minimum with 1000 generated states. Runs were permitted to continue for 10000 
generated states. 

4.2.3. Third-Stage Optimization 

The third-stage optimization used a quasi-local code, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) 
algorithm [85], to gain an extra 2 or 3 figures of precision in the global minimum. This typically took 
several hundred states, and runs were permitted to continue for 500 generated states. Constraints were 
enforced by the method of penalties added to the cost function outside the constraints. 
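
The penalty approach can be sketched as a wrapper around the cost function, so that an unconstrained local optimizer such as BFGS is steered back into the allowed region. The quadratic form of the penalty, the weight, and the names below are assumptions for illustration; the text states only that penalties were added outside the constraints.

```python
def penalized_cost(cost, lower, upper, weight=1.0e6):
    """Return a cost function that adds a quadratic penalty outside [lower, upper]."""
    def wrapped(x):
        violation = 0.0
        for xi, lo, hi in zip(x, lower, upper):
            if xi < lo:
                violation += (lo - xi) ** 2
            elif xi > hi:
                violation += (xi - hi) ** 2
        # Inside the box the violation is zero and the original cost is returned unchanged.
        return cost(x) + weight * violation
    return wrapped
```

A smooth penalty of this kind keeps the wrapped function differentiable at the boundary, which matters for a gradient-based quasi-local method.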

The BFGS code typically got stuck in a local minimum quite early if invoked just after the first-stage 
optimization. (There never was a reasonable chance of getting close to the global minimum using the
BFGS code as a first-stage optimizer.) These fits were much more efficient than those in a previous 1991
study [13], where VFSR, the precursor code to ASA, was used for a long stage-one optimization which 
was then turned over to BFGS. 

4.3. Results 

Figs. 1-3 compare the CMI to raw data for an alcoholic subject for the a_1, a_m and a_n paradigms.
Figs. 4-6 give similar comparisons for a control subject for the c_1, c_m and c_n paradigms. The SMNI
CMI give better signal to noise resolution than the raw data, especially comparing the significant
matching tasks between the control and the alcoholic groups, e.g., the c_m and a_m paradigms. The CMI
can be processed further as is the raw data, and also used to calculate "energy" and "information/entropy"
densities.

FIG. 1. For the initial-stimulus a_1 paradigm for alcoholic subject co2a0000364, plots are
given of activities under 6 electrodes of the CMI in the upper figure, and of the electric
potential in the lower figure.

FIG. 2. For the match second-stimulus a_m paradigm for alcoholic subject co2a0000364,
plots are given of activities under 6 electrodes of the CMI in the upper figure, and of the
electric potential in the lower figure.

FIG. 3. For the no-match second-stimulus a_n paradigm for alcoholic subject co2a0000364,
plots are given of activities under 6 electrodes of the CMI in the upper figure, and of the
electric potential in the lower figure.

FIG. 4. For the initial-stimulus c_1 paradigm for control subject co2c0000337, plots are
given of activities under 6 electrodes of the CMI in the upper figure, and of the electric
potential in the lower figure.

FIG. 5. For the match second-stimulus c_m paradigm for control subject co2c0000337, plots
are given of activities under 6 electrodes of the CMI in the upper figure, and of the electric
potential in the lower figure.

FIG. 6. For the no-match second-stimulus c_n paradigm for control subject co2c0000337,
plots are given of activities under 6 electrodes of the CMI in the upper figure, and of the
electric potential in the lower figure.

Similar results are seen for other subjects. A compressed tarfile for 10 control and 10 alcoholic subjects,
a total of 120 PostScript figures, can be retrieved via WWW from
http://www.ingber.com/MISC.DIR/smni97_eeg_cmi.tar.Z, or as file smni97_eeg_cmi.tar.Z via FTP from
ftp.ingber.com in the MISC.DIR directory.

5. CONCLUSIONS 

5.1. CMI and Linear Models 

It is clear that the CMI follow the measured potential variables closely. In large part, this is due to the
prominent attractors near the firing states $M^G$ being close to their origins, resulting in moderate threshold
functions $F^G$ in these regions. This keeps the term in the drifts proportional to $\tanh F^G$ near its lowest
values, yielding values of the drifts on the order of the time derivatives of the potentials. The diffusions,
proportional to $\mathrm{sech}\, F^G$, also do not fluctuate to very large values.

However, when the feedback among potentials under electrode sites is strong, leading to enhanced
(nonlinear) changes in the drifts and diffusions, then these do cause larger signals in the CMI
relative to those appearing in the raw potentials. Thus, these effects are strongest in the c_m sets of data, 
where the control (normal) subjects demonstrate more intense circuitry interactions among electrode sites 
during the matching paradigm. 

These results also support independent studies of primarily long-ranged EEG activity, that have concluded 
that EEG many times appears to demonstrate quasi-linear interactions [39,86]. However, it must be noted 
that this is only true within the confines of an attractor of highly nonlinear short-ranged columnar 
interactions. It requires some effort, e.g., global optimization of a robust multivariate stochastic nonlinear 
system to achieve finding this attractor. Theoretically, using the SMNI model, this is performed using the 
ASA code. Presumably, the neocortical system utilizes neuromodular controls to achieve this attractor 
state [55,78], as suggested in early SMNI studies [3,4]. 

5.2. CMI Features 

Essential features of the SMNI CMI approach are: (a) A realistic SMNI model, clearly capable of
modeling EEG phenomena, is used, including both long-ranged columnar interactions across electrode
sites and short-ranged columnar interactions under each electrode site. (b) The data is used raw for the
nonlinear model, and only after the fits are moments (averages and variances) taken of the derived CMI
indicators; this is unlike other studies that most often start with averaged potential data. (c) A novel and
sensitive measure, CMI, is used, which has been shown to be successful in enhancing resolution of signals
in another stochastic multivariate time series system, financial markets [21,22]. As was performed in
those studies, future SMNI projects can similarly use recursive ASA optimization, with an inner-shell
fitting CMI of subjects' EEG, embedded in an outer-shell of parameterized customized clinician's AI-type
rules acting on the CMI, to create supplemental decision aids.

Canonical momenta offer an intuitive yet detailed coordinate system of some complex systems amenable
to modeling by methods of nonlinear nonequilibrium multivariate statistical mechanics. These can be
used as reasonable indicators of new and/or strong trends of behavior, upon which reasonable decisions
and actions can be based, and therefore can be considered as important supplemental aids to other
clinical indicators.

5.3. CMI and Source Localization 

Global ASA optimization, fitting the nonlinearities inherent in the synergistic contributions from short- 
ranged columnar firings and from long-ranged fibers, makes it possible to disentangle their contributions 
to some specific electrode circuitries among columnar firings under regions separated by cm, at least to 
the degree that the CMI clearly offer superior signal to noise than the raw data. Thus this paper at least 
establishes the utility of the CMI for EEG analyses, which can be used to complement other EEG 
modeling techniques. In this paper, a plausible circuitry was first hypothesized (by a group of experts), 
and it remains to be seen just how many more electrodes can be added to such studies with the goal being 
to have ASA fits determine the optimal circuitry. 

It is clear that future SMNI projects should integrate current modeling technologies together with the 
CMI. For example, one approach for adding CMI to this set of tools would be to use source-localization 
techniques to generate simulated macrocolumnar cortical potentials (effectively a best fit of source- 
generated potentials to raw scalp data) to determine the CMI. The CMI then can provide further 
disentanglement of short-ranged and long-ranged interactions to determine most likely circuit dynamics. 
Since source localization often is a non-unique process, this may provide an iterative approach to aid finer
source localization. That is, SMNI is a nonlinear stochastic model based on realistic neuronal
interactions, and it is reasonable to assume that the derived CMI add much additional information to these 
localization analyses. 

5.4. SMNI Features 

Sets of EEG data taken during selective attention tasks have been fit using parameters either set to
experimentally observed values or fit within experimentally observed ranges. The ranges of
columnar firings are consistent with a centering mechanism derived for STM in earlier papers. 

These results, in addition to their importance in reasonably modeling EEG with SMNI, also have a deeper 
theoretical importance with respect to the scaling of neocortical mechanisms of interaction across 
disparate spatial scales and behavioral phenomena: As has been pointed out previously, SMNI has given 
experimental support to the derivation of the mesoscopic probability distribution, illustrating common 
forms of interactions between their entities, i.e., neurons and columns of neurons, respectively. The 
nonlinear threshold factors are defined in terms of electrical-chemical synaptic and neuronal parameters 
all lying within their experimentally observed ranges. It also was noted that the most likely trajectories of 
the mesoscopic probability distribution, representing averages over columnar domains, give a description 
of the systematics of macroscopic EEG in accordance with experimental observations. It has been 
demonstrated that the macroscopic regional probability distribution can be derived to have the same
functional form as the mesoscopic distribution, where the macroscopic drifts and diffusions of the
potentials described by the $\Phi$'s are simply linearly related to the (nonlinear) mesoscopic drifts and
diffusions of the columnar firing states given by the $M^G$'s. Then, this macroscopic probability
distribution gives a reasonable description of experimentally observed EEG. 

The theoretical and experimental importance of specific scaling of interactions in the neocortex has been 
quantitatively demonstrated on individual brains. The explicit algebraic form of the probability 
distribution for mesoscopic columnar interactions is driven by a nonlinear threshold factor of the same 
form taken to describe microscopic neuronal interactions, in terms of electrical-chemical synaptic and 
neuronal parameters all lying within their experimentally observed ranges; these threshold factors largely 
determine the nature of the drifts and diffusions of the system. This mesoscopic probability distribution 
has successfully described STM phenomena and, when used as a basis to derive the most likely
trajectories using the Euler-Lagrange variational equations, it also has described the systematics of EEG
phenomena. In this paper, the mesoscopic form of the full probability distribution has been taken more 
seriously for macroscopic interactions, deriving macroscopic drifts and diffusions linearly related to sums 
of their (nonlinear) mesoscopic counterparts, scaling its variables to describe interactions among regional 
interactions correlated with observed electrical activities measured by electrode recordings of scalp EEG, 
with apparent success. These results give strong quantitative support for an accurate intuitive picture, 
portraying neocortical interactions as having common algebraic or physics mechanisms that scale across 
quite disparate spatial scales and functional or behavioral phenomena, i.e., describing interactions among 
neurons, columns of neurons, and regional masses of neurons. 

5.5. Summary 

SMNI is a reasonable approach to extract more "signal" out of the "noise" in EEG data, in terms of 
physical dynamical variables, than by merely performing regression statistical analyses on collateral 
variables. To learn more about complex systems, inevitably functional models must be formed to 
represent huge sets of data. Indeed, modeling phenomena is as much a cornerstone of 20th century 
science as is collection of empirical data [87]. 

It seems reasonable to speculate on the evolutionary desirability of developing Gaussian-Markovian 
statistics at the mesoscopic columnar scale from microscopic neuronal interactions, and maintaining this 
type of system up to the macroscopic regional scale. I.e., this permits maximal processing of 
information [73]. There is much work to be done, but modern methods of statistical mechanics have 
helped to point the way to promising approaches. 

APPENDIX A: ADAPTIVE SIMULATED ANNEALING (ASA) 
1. General Description 

Simulated annealing (SA) was developed in 1983 to deal with highly nonlinear problems [88], as an 
extension of a Monte-Carlo importance-sampling technique developed in 1953 for chemical physics 
problems. It helps to visualize the problems presented by such complex systems as a geographical terrain. 
For example, consider a mountain range, with two "parameters," e.g., along the North-South and
East-West directions, with the goal to find the lowest valley in this terrain. SA approaches this problem
similar to using a bouncing ball that can bounce over mountains from valley to valley. Start at a high 
"temperature," where the temperature is an SA parameter that mimics the effect of a fast moving particle 
in a hot object like a hot molten metal, thereby permitting the ball to make very high bounces and being 
able to bounce over any mountain to access any valley, given enough bounces. As the temperature is 
made relatively colder, the ball cannot bounce so high, and it also can settle to become trapped in 
relatively smaller ranges of valleys. 

Imagine that a mountain range is aptly described by a "cost function." Define probability distributions of 
the two directional parameters, called generating distributions since they generate possible valleys or 
states to explore. Define another distribution, called the acceptance distribution, which depends on the 
difference of cost functions of the present generated valley to be explored and the last saved lowest valley. 
The acceptance distribution decides probabilistically whether to stay in a new lower valley or to bounce 
out of it. All the generating and acceptance distributions depend on temperatures. 
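
The bouncing-ball picture can be sketched as a generic simulated-annealing loop. This is not the ASA code: the Gaussian generating distribution, the geometric cooling factor, the comparison against the current (rather than the best saved) state in the acceptance test, and all names are assumptions chosen to keep the example short.

```python
import math
import random

def simulated_annealing(cost, x0, lower, upper, n_steps=5000, t0=1.0, cooling=0.999, seed=0):
    """Minimize `cost` over a box and return the lowest point visited."""
    rng = random.Random(seed)
    current, f_current = list(x0), cost(x0)
    best, f_best = list(current), f_current
    temp = t0
    for _ in range(n_steps):
        # Generating distribution: a bounce whose reach shrinks with temperature,
        # clipped so that the candidate stays inside the box.
        candidate = [
            min(hi, max(lo, x + rng.gauss(0.0, 1.0) * temp * (hi - lo)))
            for x, lo, hi in zip(current, lower, upper)
        ]
        f_candidate = cost(candidate)
        # Acceptance distribution: always move downhill, sometimes move uphill.
        delta = f_candidate - f_current
        if delta <= 0.0 or rng.random() < math.exp(-delta / temp):
            current, f_current = candidate, f_candidate
            if f_current < f_best:
                best, f_best = list(current), f_current
        temp *= cooling  # colder: smaller bounces, fewer uphill moves
    return best, f_best

def terrain(p):
    """Toy two-parameter 'mountain range' with two valleys of unequal depth."""
    x, y = p
    return (x * x - 1.0) ** 2 + 0.3 * x + y * y
```

Calling `simulated_annealing(terrain, [1.5, 1.5], [-2.0, -2.0], [2.0, 2.0])` starts the ball on a slope above the shallower valley; at high temperature it can still cross the ridge between the valleys, which a purely downhill method could not do.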

In 1984 [89], it was established that SA possessed a proof that, by carefully controlling the rates of 
cooling of temperatures, it could statistically find the best minimum, e.g., the lowest valley of our 
example above. This was good news for people trying to solve hard problems which could not be solved 
by other algorithms. The bad news was that the guarantee was only good if they were willing to run SA 
forever. In 1987, a method of fast annealing (FA) was developed [90], which permitted lowering the 
temperature exponentially faster, thereby statistically guaranteeing that the minimum could be found in 
some finite time. However, that time still could be quite long. Shortly thereafter, Very Fast Simulated 
Reannealing (VFSR) was developed [24], now called Adaptive Simulated Annealing (ASA), which is 
exponentially faster than FA. 

ASA has been applied to many problems by many people in many disciplines [26,27,91]. The feedback 
of many users regularly scrutinizing the source code ensures its soundness as it becomes more flexible 
and powerful. The code is available via the world-wide web (WWW) as http://www.ingber.com/ which
also can be accessed via anonymous FTP from ftp.ingber.com.

2. Mathematical Outline

ASA considers a parameter $\alpha^i_k$ in dimension $i$ generated at annealing-time $k$ with the range

$$\alpha^i_k \in [A_i, B_i] , \qquad (A.1)$$

calculated with the random variable $y^i$,

$$\alpha^i_{k+1} = \alpha^i_k + y^i (B_i - A_i) ,$$

$$y^i \in [-1, 1] . \qquad (A.2)$$

The generating function $g_T(y)$ is defined,

$$g_T(y) = \prod_{i=1}^{D} \frac{1}{2 (|y^i| + T_i) \ln(1 + 1/T_i)} \equiv \prod_{i=1}^{D} g^i_T(y^i) , \qquad (A.3)$$

where the subscript $i$ on $T_i$ specifies the parameter index, and the $k$-dependence in $T_i(k)$ for the annealing
schedule has been dropped for brevity. Its cumulative probability distribution is

$$G_T(y) = \int_{-1}^{y^1} \cdots \int_{-1}^{y^D} dy'^1 \cdots dy'^D \, g_T(y') \equiv \prod_{i=1}^{D} G^i_T(y^i) ,$$

$$G^i_T(y^i) = \frac{1}{2} + \frac{\mathrm{sgn}(y^i)}{2} \frac{\ln(1 + |y^i| / T_i)}{\ln(1 + 1/T_i)} . \qquad (A.4)$$

$y^i$ is generated from a $u^i$ from the uniform distribution

$$u^i \in U[0, 1] ,$$

$$y^i = \mathrm{sgn}(u^i - \tfrac{1}{2}) \, T_i \left[ (1 + 1/T_i)^{|2 u^i - 1|} - 1 \right] . \qquad (A.5)$$
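
The sampling rule (A.5) is the inverse of the cumulative distribution (A.4), which the following sketch makes explicit for a single parameter dimension. The function names are hypothetical, and redrawing until the new value falls inside $[A_i, B_i]$ is an assumption about how out-of-range proposals are handled.

```python
import math
import random

def asa_sample(u, temp):
    """Map a uniform deviate u in [0, 1] to y in [-1, 1] via Eq. (A.5)."""
    sign = 1.0 if u >= 0.5 else -1.0
    return sign * temp * ((1.0 + 1.0 / temp) ** abs(2.0 * u - 1.0) - 1.0)

def asa_cdf(y, temp):
    """Cumulative distribution G_T^i(y^i) of Eq. (A.4)."""
    sign = 1.0 if y >= 0.0 else -1.0
    return 0.5 + 0.5 * sign * math.log(1.0 + abs(y) / temp) / math.log(1.0 + 1.0 / temp)

def asa_generate(alpha, lower, upper, temp, rng):
    """Propose a new parameter value per Eq. (A.2), redrawing until it lies in range."""
    while True:
        y = asa_sample(rng.random(), temp)
        new = alpha + y * (upper - lower)
        if lower <= new <= upper:
            return new
```

At small `temp` most draws of `y` cluster near zero, yet the distribution keeps a fat tail, so occasional proposals still span the full parameter range.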



It is straightforward to calculate that for an annealing schedule for $T_i$

$$T_i(k) = T_{0i} \exp(-c_i k^{1/D}) , \qquad (A.6)$$

a global minimum statistically can be obtained. I.e.,

$$\sum_{k_0}^{\infty} g_k \approx \sum_{k_0}^{\infty} \left[ \prod_{i=1}^{D} \frac{1}{2 |y^i| c_i} \right] \frac{1}{k} = \infty . \qquad (A.7)$$

Control can be taken over $c_i$, such that

$$T_{fi} = T_{0i} \exp(-m_i) \ \text{when} \ k_f = \exp n_i ,$$

$$c_i = m_i \exp(-n_i / D) , \qquad (A.8)$$

where $m_i$ and $n_i$ can be considered "free" parameters to help tune ASA for specific problems.
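
A short numerical sketch of the schedule (A.6) with the tuning relation (A.8) follows; the function names and the sample values of $m_i$, $n_i$ and $D$ are illustrative assumptions.

```python
import math

def asa_c(m, n, dim):
    """Schedule constant c_i = m_i exp(-n_i / D) of Eq. (A.8)."""
    return m * math.exp(-n / dim)

def asa_temperature(k, t0, c, dim):
    """Parameter temperature T_i(k) = T_0i exp(-c_i k^(1/D)) of Eq. (A.6)."""
    return t0 * math.exp(-c * k ** (1.0 / dim))
```

With these definitions, $c_i k_f^{1/D} = m_i$ at $k_f = \exp n_i$, so the temperature has fallen by the factor $\exp(-m_i)$ at that annealing-time regardless of the dimension $D$; the dimension enters only through how slowly the schedule approaches that point.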

3. ASA OPTIONS 

ASA has over 100 OPTIONS available for tuning. A few are most relevant to this project. 
3.1. Reannealing 

Whenever doing a multi-dimensional search in the course of a complex nonlinear physical problem,
inevitably one must deal with different changing sensitivities of the $\alpha^i$ in the search. At any given
annealing-time, the range over which the relatively insensitive parameters are being searched can be
"stretched out" relative to the ranges of the more sensitive parameters. This can be accomplished by
periodically rescaling the annealing-time $k$, essentially reannealing, every hundred or so acceptance-events
(or at some user-defined modulus of the number of accepted or generated states), in terms of the
sensitivities $s_i$ calculated at the most current minimum value of the cost function, $C$,

$$s_i = \partial C / \partial \alpha^i . \qquad (A.9)$$

In terms of the largest $s_i = s_{\max}$, a default rescaling is performed for each $k_i$ of each parameter dimension,
whereby a new index $k'_i$ is calculated from each $k_i$,

$$k_i \to k'_i ,$$

$$T_{ik'} = T_{ik} (s_{\max} / s_i) ,$$

$$k'_i = \left( \ln(T_{i0} / T_{ik'}) / c_i \right)^D . \qquad (A.10)$$

$T_{i0}$ is set to unity to begin the search, which is ample to span each parameter dimension.
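
The rescaling (A.10) can be sketched for one parameter dimension as below. The function name is hypothetical, the sensitivity is assumed to be a positive magnitude, and clamping the rescaled temperature at its initial value is an assumption added so that the logarithm stays non-negative.

```python
import math

def reanneal_index(k, t0, c, dim, sensitivity, s_max):
    """Return (T_ik', k'_i) after rescaling by s_max / s_i, following Eq. (A.10)."""
    t_k = t0 * math.exp(-c * k ** (1.0 / dim))   # current temperature from Eq. (A.6)
    t_new = t_k * (s_max / sensitivity)          # insensitive parameters are reheated
    t_new = min(t_new, t0)                       # assumption: never exceed the initial temperature
    k_new = (math.log(t0 / t_new) / c) ** dim    # annealing-time that corresponds to t_new
    return t_new, k_new
```

The most sensitive parameter ($s_i = s_{\max}$) keeps its annealing-time, while a less sensitive one is moved back to an earlier, hotter point on its schedule and therefore searches a wider range.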

3.2. Quenching 

Another adaptive feature of ASA is its ability to perform quenching in a methodical fashion. This is
applied by noting that the temperature schedule above can be redefined as

$$T_i(k_i) = T_{0i} \exp(-c_i k_i^{Q_i / D}) ,$$

$$c_i = m_i \exp(-n_i Q_i / D) , \qquad (A.11)$$

in terms of the "quenching factor" $Q_i$. The sampling proof fails if $Q_i > 1$ as

$$\sum_k \prod^{D} 1 / k^{Q_i / D} = \sum_k 1 / k^{Q_i} < \infty . \qquad (A.12)$$

This simple calculation shows how the "curse of dimensionality" arises, and also gives a possible way of
living with this disease. In ASA, the influence of large dimensions becomes clearly focussed on the
exponential of the power of $k$ being $1/D$, as the annealing required to properly sample the space becomes
prohibitively slow. So, if resources cannot be committed to properly sample the space, then for some
systems perhaps the next best procedure may be to turn on quenching, whereby $Q_i$ can become on the
order of the number of dimensions.

The scale of the power of $1/D$ temperature schedule used for the acceptance function can be altered in a
similar fashion. However, this does not affect the annealing proof of ASA, and so this may be used without
damaging the sampling property.
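
The quenched schedule (A.11) and the convergence argument (A.12) can be illustrated numerically. This is a sketch under stated assumptions: the names are hypothetical, and the partial sums only suggest, rather than prove, the divergence for $Q_i = 1$ and the convergence for $Q_i > 1$.

```python
import math

def quenched_temperature(k, t0, m, n, dim, q=1.0):
    """T_i(k) = T_0i exp(-c_i k^(Q_i/D)) with c_i = m_i exp(-n_i Q_i / D), Eq. (A.11)."""
    c = m * math.exp(-n * q / dim)
    return t0 * math.exp(-c * k ** (q / dim))

def partial_sum(q, n_terms):
    """Partial sum of 1 / k^Q, the series appearing in Eq. (A.12)."""
    return sum(1.0 / k ** q for k in range(1, n_terms + 1))
```

For $Q_i = 1$ the partial sums keep growing like $\ln k$, which is what the sampling proof requires, whereas for $Q_i = 2$ they level off below $\pi^2/6$. Note that with the same $m_i$ and $n_i$ every choice of $Q_i$ reaches $T_{0i} \exp(-m_i)$ at $k_f = \exp n_i$; quenching changes how the schedule gets there and how quickly it falls afterwards.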

3.3. Self Optimization 

If not much information is known about a particular system, if the ASA defaults do not seem to work very 
well, and if after a bit of experimentation it still is not clear how to select values for some of the ASA 
OPTIONS, then the SELF_OPTIMIZE OPTIONS can be very useful. This sets up a top level search on
the ASA OPTIONS themselves, using criteria of the system as its own cost function, e.g., the best
attained optimal value of the system's cost function (the cost function for the actual problem to be solved) 
for each given set of top level OPTIONS, or the number of generated states required to reach a given 
value of the system's cost function, etc. Since this can consume a lot of CPU resources, it is 
recommended that only a few ASA OPTIONS and a scaled down system cost function or system data be 
selected for this OPTIONS. 

Even if good results are being attained by ASA, SELF_OPTIMIZE can be used to find a more efficient set 
of ASA OPTIONS. Self optimization of such parameters can be very useful for production runs of 
complex systems. 

3.4. Parallel Code 

It is quite difficult to directly parallelize an SA algorithm [26], e.g., without incurring very restrictive 
constraints on temperature schedules [92], or violating an associated sampling proof [93]. However, the 
fat tail of ASA permits parallelization of developing generated states prior to subjecting them to the 
acceptance test [14]. The ASA_PARALLEL OPTIONS provide parameters to easily parallelize the code, 
using various implementations, e.g., PVM, shared memory, etc. 

The scale of parallelization afforded by ASA, without violating its sampling proof, is given by a typical 
ratio of the number of generated to accepted states. Several experts in parallelization suggest that massive 
parallelization, e.g., on the order of the human brain, may take place quite far into the future, that this
might be somewhat less useful for many applications than previously thought, and that most useful scales
of parallelization might be on scales of order 10 to 1000. Depending on the specific problem, such scales
are common in ASA optimization, and the ASA code can implement such parallelization.